The parton distribution functions (PDFs) are a non-negotiable input to almost all theory predictions at hadron colliders. In this talk, I introduce PDF determination by global analysis and discuss selected topics concerning recent relevant data from HERA and the Tevatron, before giving some prospects for the LHC. The combination of H1 and ZEUS cross sections reduces uncertainties and will be an important input to future global PDF analyses. The theoretical description of the heavy-quark contribution to structure functions at HERA has a significant influence on predictions at the LHC. New W and Z data from the Tevatron Run II provide important PDF constraints, but there are currently problems describing the latest data on the lepton charge asymmetry from $W\to\ell\nu$ decays. The Tevatron Run II jet production data prefer a smaller high-$x$ gluon than the previous Run I data, which affects predictions for Higgs cross sections at the Tevatron. It is now possible to consistently calculate a combined "PDF+$\alpha_s$" uncertainty on hadronic cross sections, which is around 2-3% for the W and Z total cross sections at the LHC, reflecting their potential as a "standard candle" to measure machine luminosity. Parton luminosity functions are useful quantities for studying properties of hadronic cross sections. Precision measurements at the LHC will provide further constraints on PDFs as data accumulate in the early running period.
1. Introduction

Protons are not elementary particles: they are made of partons (quarks and gluons). Parton distribution functions (PDFs) are therefore essential to relate theory to experiment at HERA, the Tevatron and the LHC [1]. Each PDF $f_{a/A}(x,Q^2)$ intuitively gives the number density of partons $a$ in a hadron $A$ with momentum fraction $x$ at a hard scale $Q^2 \gg \Lambda_{\rm QCD}^2$. The "standard" perturbative QCD framework at hadron colliders is fixed-order collinear factorisation, which holds up to formally power-suppressed ("higher-twist") terms $\mathcal{O}(\Lambda_{\rm QCD}^2/Q^2)$. A hadronic cross section $\sigma_{AB}$ can be written as a sum of partonic cross sections $\hat\sigma_{ab}$, each expanded as a perturbative series in the running strong coupling $\alpha_s(Q^2)$, convoluted with a PDF for each hadron, i.e.

$$\sigma_{AB} = \sum_{a,b=q,g}\left[\hat\sigma^{\rm LO}_{ab} + \alpha_s\,\hat\sigma^{\rm NLO}_{ab} + \alpha_s^2\,\hat\sigma^{\rm NNLO}_{ab} + \dots\right]\otimes f_{a/A}(x_a,Q^2)\otimes f_{b/B}(x_b,Q^2), \tag{1.1}$$

where $\otimes$ indicates a convolution over the momentum fraction $x_{a,b}$. The scale dependence of the PDFs is given by the Dokshitzer-Gribov-Lipatov-Altarelli-Parisi (DGLAP) evolution equations:

$$\frac{\partial f_{a/A}(x,Q^2)}{\partial\ln Q^2} = \frac{\alpha_s}{2\pi}\sum_{a'=q,g}\left[P^{\rm LO}_{aa'} + \alpha_s\,P^{\rm NLO}_{aa'} + \alpha_s^2\,P^{\rm NNLO}_{aa'} + \dots\right]\otimes f_{a'/A}(x,Q^2), \tag{1.2}$$

while the running of $\alpha_s(Q^2)$ satisfies the renormalisation group equation. However, the input values $f_{a/A}(x,Q_0^2)$ and $\alpha_s(Q_0^2)$ to the evolution equations are incalculable in perturbative QCD and so need to be extracted from data. Structure functions in deep-inelastic scattering (DIS) can similarly be written in terms of perturbative coefficient functions, $C_{i,a}$, convoluted with PDFs, i.e.

$$F_i^A(x,Q^2) = \sum_{a=q,g}\left[C^{\rm LO}_{i,a} + \alpha_s\,C^{\rm NLO}_{i,a} + \alpha_s^2\,C^{\rm NNLO}_{i,a} + \dots\right]\otimes f_{a/A}. \tag{1.3}$$
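
The convolution $\otimes$ in (1.1)-(1.3) is the Mellin convolution $(f\otimes g)(x) = \int_x^1 \frac{dz}{z}\,f(z)\,g(x/z)$. As a minimal numerical sketch, assuming ordinary (regular) functions, it can be evaluated with the midpoint rule after substituting $t = \ln z$. The function name and step count are illustrative choices. Production evolution codes typically use interpolation grids or Mellin-space methods instead, and the plus-distributions in splitting functions would need separate treatment.

```python
import math


def mellin_convolution(f, g, x, n_steps=2000):
    """Return (f ⊗ g)(x) = ∫_x^1 dz/z f(z) g(x/z) for ordinary (regular) functions.

    Substituting t = ln z turns dz/z into dt, so the integral runs over
    t in [ln x, 0] and is evaluated with the midpoint rule.
    """
    if not 0.0 < x < 1.0:
        raise ValueError("x must lie strictly between 0 and 1")
    t_min = math.log(x)      # lower limit z = x
    h = -t_min / n_steps     # step size in t = ln z
    total = 0.0
    for k in range(n_steps):
        z = math.exp(t_min + (k + 0.5) * h)
        total += f(z) * g(x / z)
    return total * h


# Check: with f = g = 1 the convolution is ∫_x^1 dz/z = -ln x.
print(mellin_convolution(lambda z: 1.0, lambda y: 1.0, 0.1))  # approximately 2.302585 (= ln 10)
```

The convolution is symmetric, $f\otimes g = g\otimes f$, which can be checked numerically by swapping the two arguments.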

Since the PDFs are universal, they can be determined from a wide range of existing data, for example, from the HERA $ep$ collider (H1 and ZEUS experiments), from fixed-target experiments in $\ell p$ and $\ell d$ scattering (BCDMS, NMC, E665, SLAC), $\nu N$ scattering (CCFR, NuTeV, CHORUS), $pp$ and $pd$ scattering (E866/NuSea), together with $p\bar p$ collider data from the Tevatron (CDF, D0).

The paradigm for PDF determination by "global analysis" is to parameterise the $x$ dependence of $f_{a/A}(x,Q_0^2)$ for each flavour $a = q,g$ at the input scale $Q_0^2 \sim 1$ GeV$^2$ in some flexible form, subject to number- and momentum-sum-rule constraints. The PDFs are then evolved to higher scales $Q^2 > Q_0^2$ using the DGLAP evolution equations. The evolved PDFs are convoluted with $C_{i,a}$ or $\hat\sigma_{ab}$ to calculate theory predictions corresponding to a wide variety of data. The input parameters are then varied to minimise a global goodness-of-fit measure ($\chi^2$).
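
The fitting loop just described can be illustrated with a deliberately simplified toy. This is a minimal sketch, assuming a single distribution, no DGLAP evolution (the pseudo-data are taken directly at $Q_0^2$), uncorrelated errors, and a brute-force grid search in place of a proper minimiser. The form $xf(x,Q_0^2) = A\,x^{a}(1-x)^{b}$, the momentum fraction 0.5 and the "true" parameters used to generate the pseudo-data are all hypothetical. As in the real fits, the normalisation $A$ is fixed by the momentum sum rule, using $\int_0^1 x^a(1-x)^b\,dx = \Gamma(a+1)\Gamma(b+1)/\Gamma(a+b+2)$.

```python
import math

# Hypothetical toy: fraction of the proton's momentum carried by this one distribution.
MOMENTUM_FRACTION = 0.5


def xf_toy(x, a, b):
    """Toy input x*f(x, Q0^2) = A x^a (1-x)^b, with A fixed by the momentum sum rule."""
    beta = math.gamma(a + 1) * math.gamma(b + 1) / math.gamma(a + b + 2)  # ∫_0^1 x^a (1-x)^b dx
    return MOMENTUM_FRACTION / beta * x**a * (1 - x) ** b


def chi2(params, data):
    """Goodness-of-fit for uncorrelated (x, value, error) points."""
    a, b = params
    return sum(((value - xf_toy(x, a, b)) / err) ** 2 for x, value, err in data)


def grid_fit(data, a_grid, b_grid):
    """Brute-force minimisation of chi2 over a two-dimensional parameter grid."""
    return min(((a, b) for a in a_grid for b in b_grid), key=lambda p: chi2(p, data))


# Pseudo-data from hypothetical "true" parameters (a, b) = (0.3, 3.0) with 5% errors.
xs = [0.01, 0.05, 0.1, 0.2, 0.3, 0.5, 0.7]
data = [(x, xf_toy(x, 0.3, 3.0), 0.05 * xf_toy(x, 0.3, 3.0)) for x in xs]
a_grid = [i / 10 for i in range(-5, 11)]
b_grid = [j / 10 for j in range(10, 61)]
best = grid_fit(data, a_grid, b_grid)
print(best, chi2(best, data))  # prints (0.3, 3.0) 0.0
```

A real global fit instead evolves each trial input with DGLAP before comparing to data, includes correlated systematic uncertainties, and uses a gradient-based minimiser over a few tens of parameters.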

The determination of parton distributions by global analysis has been an "industry" for more than 20 years, with regular updates as new data and new theory become available. The first NLO fit was done by the Martin-Roberts-Stirling group (1987), later joined by Thorne (1998), until the retirement of Roberts (2005) and the addition of G.W. (2006). The previous "MRST" fits have recently been superseded by the "MSTW 2008" (LO, NLO and NNLO) fits [2]. The other major group is "CTEQ" (Coordinated Theoretical-Experimental Project on QCD), and the latest public fits are CTEQ6L1 at LO and CTEQ6.6 at NLO [§], while an NNLO fit is still forthcoming. Other groups generally fit a more restricted range of data with fewer free parameters [BL ^, [7]]. The NNPDF Collaboration [8, 9] uses an interesting alternative approach, determining PDFs from a neural-network parameterisation to avoid bias due to a particular functional form of the input.
2. HERA

Existing HERA data already provide one of the most important inputs to global PDF analyses, especially in providing a strong constraint on the small-$x$ gluon and sea-quark distributions. The separate H1 and ZEUS inclusive cross-section measurements in DIS have recently been combined to improve accuracy. The averaging procedure gives a large reduction in the correlated systematic uncertainties due to the different properties of the two detectors, effectively leading to a "cross-calibration" of the systematic uncertainties between the two experiments. These new combined HERA data will prove invaluable in the next generation of global fits. A PDF fit only to these HERA data has already been performed [7], but with only 10 free PDF parameters used to determine the central fit compared to, for example, 28 free PDF parameters for the "MSTW 2008" fit, reflecting the incomplete flavour separation provided by fitting only to the HERA inclusive data.
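
The statistical core of such an averaging can be sketched as an inverse-variance weighted mean of two measurements of the same quantity, assuming uncorrelated errors. This is only the simplest limit: the actual H1/ZEUS combination is a $\chi^2$ minimisation that also fits shifts of the correlated systematic sources (nuisance parameters), and it is those fitted shifts that produce the cross-calibration; that part is not shown. The function name and input values below are hypothetical.

```python
import math


def weighted_average(measurements):
    """Inverse-variance weighted mean and error of uncorrelated (value, error) pairs."""
    weights = [1.0 / err**2 for _, err in measurements]
    total = sum(weights)
    mean = sum(w * value for w, (value, _) in zip(weights, measurements)) / total
    return mean, 1.0 / math.sqrt(total)


# Two hypothetical measurements of the same reduced cross section with equal errors.
mean, err = weighted_average([(1.00, 0.10), (1.20, 0.10)])
print(round(mean, 3), round(err, 4))  # 1.1 0.0707
```

With equal errors the combined uncertainty is smaller by a factor $\sqrt{2}$; the reduction of correlated systematics in the real combination can be much larger than this statistical gain.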

Heavy quarks, particularly charm, can contribute a sizeable amount to the total DIS structure function $F_2$. There are two well-defined regions in which to calculate the heavy-quark contribution. In a fixed flavour number scheme (FFNS), valid for $Q^2 \lesssim m_H^2$, the heavy-quark mass ($m_H$) is retained in the calculation of the hard-scattering coefficient function, and there is no heavy-quark PDF which would resum $\alpha_s\ln(Q^2/m_H^2)$ terms in a similar way as for light quarks. In the zero-mass variable flavour number scheme (ZM-VFNS), valid for $Q^2 \gg m_H^2$, this heavy-quark PDF is introduced, and the mass dependence is neglected in the coefficient function. A general-mass VFNS (GM-VFNS) interpolates between these two well-defined limits, using an FFNS for $Q^2 \lesssim m_H^2$ and a ZM-VFNS for $Q^2 \gg m_H^2$, although there are ambiguous $\mathcal{O}(m_H^2/Q^2)$ terms in the intermediate region of $Q^2 \gtrsim m_H^2$. The calculation of W and Z cross sections at the LHC is directly influenced by the treatment of heavy quarks in DIS, since the relevant sea-quark PDFs are determined from HERA data. The change from CTEQ6.1 (ZM-VFNS) to CTEQ6.5 (GM-VFNS) gave an 8% increase in $\sigma_{W,Z}$ at the LHC. The MRST group have used a GM-VFNS since 1998, but the change from MRST 2004 to MRST 2006 introduced the first precise definition of a GM-VFNS at NNLO, including in particular the (correct) discontinuities in the NNLO PDF evolution at $Q^2 = m_H^2$, leading to a 6% increase in $\sigma_{W,Z}$ at the LHC. Pre-2006 MRST NNLO (but not NLO) PDF sets should therefore now be considered obsolete due to the incomplete heavy-flavour treatment.

Heavy-flavour structure function data from HERA are reaching impressive precision, particularly for the charm structure function $F_2^{c\bar c}$, where the separate H1 and ZEUS measurements have also been combined using the same procedure as for the inclusive cross sections. For both charm and beauty structure functions, there is good agreement between the data and theoretical predictions using different varieties of GM-VFNS, with the $F_2^{c\bar c}$ data having some discriminating power at the lowest $Q^2$ values (but still $Q^2 > m_c^2$), where the GM-VFNS predictions exhibit the largest variation. The systematic uncertainty in the particular choice of GM-VFNS, and its effect on hadronic cross sections, remains to be fully quantified, but work is in progress towards achieving this goal.

The longitudinal proton structure function, $F_L$, was measured at HERA using data taken in the last few months of running in 2007, when the proton beam energy was lowered. The NLO and NNLO calculations tend to undershoot the HERA data at the lowest $Q^2$ and $x$ values, where the theory predictions are perturbatively unstable, while small-$x$ resummation [10] aids the description. Small-$x$ resummation will be important at the LHC, for example, in low-mass Drell-Yan production, where the fixed-order theory predictions are also seen to be perturbatively unstable.
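
The role of the lowered proton beam energy can be made explicit. At low $Q^2$, neglecting $Z$ exchange, the reduced DIS cross section is $\sigma_r = F_2(x,Q^2) - \frac{y^2}{1+(1-y)^2}\,F_L(x,Q^2)$ with inelasticity $y = Q^2/(xs)$. Measurements at the same $(x,Q^2)$ but different $s$ therefore give a linear system for $F_2$ and $F_L$. This is a minimal sketch, assuming exact (error-free) inputs. The beam energies are the nominal HERA values (27.5 GeV leptons, protons at 920 GeV or lowered to 460 GeV), while the kinematic point and structure-function values are hypothetical.

```python
def y_factor(x, q2, s):
    """Return f(y) = y^2 / (1 + (1 - y)^2) with y = Q^2 / (x s) (masses neglected)."""
    y = q2 / (x * s)
    if not 0.0 < y <= 1.0:
        raise ValueError("kinematically forbidden: need 0 < y <= 1")
    return y**2 / (1.0 + (1.0 - y) ** 2)


def reduced_cross_section(f2, fl, x, q2, s):
    """sigma_r = F2 - f(y) FL, the photon-exchange reduced cross section."""
    return f2 - y_factor(x, q2, s) * fl


def extract_f2_fl(x, q2, point_1, point_2):
    """Solve for (F2, FL) from two (s, sigma_r) measurements at the same (x, Q^2)."""
    (s1, sig1), (s2, sig2) = point_1, point_2
    c1, c2 = y_factor(x, q2, s1), y_factor(x, q2, s2)
    fl = (sig1 - sig2) / (c2 - c1)
    return sig1 + c1 * fl, fl


S_HIGH = 4 * 27.5 * 920.0  # GeV^2: E_e = 27.5 GeV, E_p = 920 GeV
S_LOW = 4 * 27.5 * 460.0   # GeV^2: proton energy lowered to 460 GeV

x, q2 = 2e-4, 10.0             # hypothetical kinematic point (Q^2 in GeV^2)
true_f2, true_fl = 1.2, 0.3    # hypothetical structure-function values
pts = [(s, reduced_cross_section(true_f2, true_fl, x, q2, s)) for s in (S_HIGH, S_LOW)]
print(extract_f2_fl(x, q2, *pts))  # approximately (1.2, 0.3)
```

The sensitivity to $F_L$ comes from the high-$y$ point, which at fixed $(x, Q^2)$ is only reachable with the lower centre-of-mass energy.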
3. Tevatron

Data from the Tevatron Run II are playing an increasing role in global PDF fits. The primary types of data used are the Z rapidity distributions [11, 12], included for the first time in the "MSTW 2008" fit, the $W\to\ell\nu$ charge asymmetry, and cross sections for inclusive jet production.

3.1 $W\to\ell\nu$ charge asymmetry

The W charge asymmetry at the Tevatron, as a function of the W rapidity ($y_W$), is

$$A_W(y_W) = \frac{d\sigma(W^+)/dy_W - d\sigma(W^-)/dy_W}{d\sigma(W^+)/dy_W + d\sigma(W^-)/dy_W} \approx \frac{u(x_1)\,d(x_2) - d(x_1)\,u(x_2)}{u(x_1)\,d(x_2) + d(x_1)\,u(x_2)}, \tag{3.1}$$

where $x_{1,2} = (M_W/\sqrt{s})\exp(\pm y_W)$, constraining mainly the $d/u$ ratio of the proton PDFs. However, experimentally, the W rapidity cannot be directly reconstructed since the longitudinal momentum of the decay neutrino is, in general, unknown. Therefore, the quantity which is traditionally measured instead is the lepton charge asymmetry, as a function of the lepton pseudorapidity $\eta_\ell$, i.e.

$$A_\ell(\eta_\ell) = \frac{d\sigma(\ell^+)/d\eta_\ell - d\sigma(\ell^-)/d\eta_\ell}{d\sigma(\ell^+)/d\eta_\ell + d\sigma(\ell^-)/d\eta_\ell}. \tag{3.2}$$
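
The leading-order approximation in (3.1) can be evaluated directly. This sketch assumes valence-like toy parameterisations for $u$ and $d$ (hypothetical, chosen only so that $d/u$ falls with $x$, and not fitted to anything), ignores sea quarks, and uses $M_W = 80.4$ GeV and $\sqrt{s} = 1.96$ TeV. It reproduces the qualitative features $A_W(0) = 0$, $A_W(-y_W) = -A_W(y_W)$, and a positive asymmetry at positive $y_W$ when $d/u$ decreases with $x$. Predicting $A_\ell(\eta_\ell)$ in (3.2) would additionally require folding in the $W\to\ell\nu$ decay distribution, which is not shown.

```python
import math

M_W = 80.4        # GeV
SQRT_S = 1960.0   # GeV, Tevatron Run II


def u_toy(x):
    """Hypothetical valence-like up-quark density (not a fitted PDF)."""
    return 2.0 * x**-0.5 * (1.0 - x) ** 3


def d_toy(x):
    """Hypothetical down-quark density, chosen so that d/u = (1 - x)/2 falls at high x."""
    return x**-0.5 * (1.0 - x) ** 4


def w_asymmetry_lo(y_w, u=u_toy, d=d_toy, mass=M_W, sqrt_s=SQRT_S):
    """LO A_W(y_W) from eq. (3.1), with x_{1,2} = (M_W / sqrt(s)) exp(±y_W)."""
    x1 = mass / sqrt_s * math.exp(y_w)
    x2 = mass / sqrt_s * math.exp(-y_w)
    if x1 >= 1.0 or x2 >= 1.0:
        raise ValueError("y_W outside the kinematic range")
    plus = u(x1) * d(x2)    # W+: u from the proton, dbar from the antiproton
    minus = d(x1) * u(x2)   # W-: d from the proton, ubar from the antiproton
    return (plus - minus) / (plus + minus)


for y in (0.0, 1.0, 2.0):
    print(y, round(w_asymmetry_lo(y), 3))
```

For these particular toy densities the ratio $u/d$ is $2/(1-x)$, so the asymmetry reduces to $(x_1 - x_2)/(2 - x_1 - x_2)$, which is a convenient cross-check of the code.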

Global PDF fits have previously used Tevatron Run I data on $A_\ell$ [13]. The MSTW 2008 fit [2] was the first to use Run II data instead [14, 15], provided in two $E_T^\ell$ bins for the case of the CDF data on $A_e$ [14]. The latest D0 data on $A_e$ [16] and $A_\mu$ [17] are badly described by current NLO PDFs, especially for $p_T^\ell > 35$ GeV, while refitting the PDFs causes tension with a number of other data sets, although this tension is reduced with modified deuteron corrections. It is not possible to describe both the D0 $A_e$ [16] and $A_\mu$ [17] data simultaneously. The effect of NNLO corrections [18, 19] (or $p_T$-resummation, as implemented in RESBOS) is small, but acts in the right direction.

CDF have recently determined $A_W$ [20] using a new technique to obtain the neutrino's longitudinal momentum by constraining the $\ell\nu$ mass to $M_W$. The MSTW 2008 PDFs using VRAP [21] give a good description (better than the previous MRST 2006 PDFs) of the CDF $A_W$ data, while modified fits to the new D0 $A_\ell$ data [16, 17] tend to undershoot the CDF $A_W$ data. Before the new precise $A_\ell$ data can be usefully included in global PDF fits, more work is needed to quantify and resolve the apparent discrepancies between (i) CDF and D0 data, (ii) $A_e$ and $A_\mu$ data, and (iii) data and theory.
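
The kinematic core of this technique is that requiring $(p_\ell + p_\nu)^2 = M_W^2$, with massless leptons and the neutrino transverse momentum taken from the missing transverse momentum, gives a quadratic equation for the neutrino longitudinal momentum, $p_z^\nu = \left[\mu\,p_z^\ell \pm E_\ell\sqrt{\mu^2 - p_{T,\ell}^2\,p_{T,\nu}^2}\right]/p_{T,\ell}^2$ with $\mu = M_W^2/2 + \vec p_{T,\ell}\cdot\vec p_{T,\nu}$. This sketch only solves that equation and forms the two candidate W rapidities. How CDF weights the two solutions is not reproduced here, and taking the real part when the discriminant is negative is an assumed convention. The event below is hypothetical.

```python
import math


def neutrino_pz_solutions(lepton, met, m_w=80.4):
    """Both neutrino p_z solutions of (p_lep + p_nu)^2 = m_w^2 for massless leptons.

    lepton: (px, py, pz) of the charged lepton in GeV.
    met:    (px, py) of the neutrino (missing transverse momentum) in GeV.
    A negative discriminant (e.g. from resolution effects) is clipped to zero,
    so the two solutions then coincide.
    """
    lx, ly, lz = lepton
    nx, ny = met
    e_lep = math.sqrt(lx**2 + ly**2 + lz**2)
    pt_lep2 = lx**2 + ly**2
    pt_nu2 = nx**2 + ny**2
    mu = 0.5 * m_w**2 + lx * nx + ly * ny
    root = e_lep * math.sqrt(max(mu**2 - pt_lep2 * pt_nu2, 0.0))
    return (mu * lz + root) / pt_lep2, (mu * lz - root) / pt_lep2


def w_rapidity_candidates(lepton, met, m_w=80.4):
    """Rapidity of the lepton-neutrino system for each p_z solution."""
    lx, ly, lz = lepton
    e_lep = math.sqrt(lx**2 + ly**2 + lz**2)
    candidates = []
    for nz in neutrino_pz_solutions(lepton, met, m_w):
        e_nu = math.sqrt(met[0] ** 2 + met[1] ** 2 + nz**2)
        e_w, pz_w = e_lep + e_nu, lz + nz
        candidates.append(0.5 * math.log((e_w + pz_w) / (e_w - pz_w)))
    return candidates


# Hypothetical event: build an on-shell pair, then "forget" the neutrino p_z.
lep, nu = (30.0, 10.0, 25.0), (-20.0, 5.0, -40.0)
e_l = math.sqrt(sum(p * p for p in lep))
e_n = math.sqrt(sum(p * p for p in nu))
m_pair = math.sqrt(2.0 * (e_l * e_n - sum(a * b for a, b in zip(lep, nu))))  # about 82 GeV
print(neutrino_pz_solutions(lep, nu[:2], m_w=m_pair))  # one root reproduces p_z = -40
print(w_rapidity_candidates(lep, nu[:2], m_w=m_pair))
```

Because the true neutrino momentum satisfies the mass constraint, one of the two roots recovers it; the remaining ambiguity is what any weighting procedure has to resolve.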

3.2 Inclusive jet production 

The Tevatron Run I data on inclusive jet production were included in global PDF fits up to MRST 2006 (and the current CTEQ6.6) as an important constraint on the high-$x$ gluon distribution. The MSTW 2008 analysis [2] was the first to include Run II jet data [22, 23], finding a preference for a smaller gluon distribution at high $x$ than that obtained with the previous Run I data. Fitting only to Run I jet data gives a bad description of Run II jet data, and vice versa, while fitting neither gives a similar description to fitting only the Run II jet data. Similar findings have been made by the CTEQ group [24], although with a slightly smaller discrepancy and a smaller change in the gluon. There is therefore some apparent inconsistency between the Run I and Run II jet data, while the Run II jet data are slightly more consistent with the rest of the data included in the global fit. The final MSTW 2008 analysis therefore dropped the Run I jet data from the fit. There is only a slight change in the gluon if the CDF Run II data obtained using the $k_T$ jet algorithm [22] are replaced by the CDF Run II data obtained using the cone-based Midpoint jet algorithm [25]. The new smaller high-$x$ gluon is also preferred by the D0 Run II dijet mass spectrum [26], especially at high rapidities, where the data prefer MSTW 2008 over CTEQ6.6.

The NNLO trend is similar to NLO, with the caveat that the exact NNLO jet cross sections are unavailable, so 2-loop threshold corrections are used instead. The smaller high-$x$ gluon (and smaller $\alpha_s$) in MSTW 2008 compared to MRST 2006 means that the predicted Higgs cross sections at the Tevatron are also smaller, since Higgs production via gluon-gluon fusion at the Tevatron probes the gluon at relatively high $x \sim M_H/\sqrt{s}$. The MSTW 2008 NNLO PDFs were used for the Tevatron exclusion results from last March [27] and last November, whereas previous results used MRST 2002 NNLO PDFs, which were fitted to Tevatron Run I jet data and also had an incomplete heavy-flavour treatment.

4. LHC 

It is common to determine the strong coupling $\alpha_s$ at the same time as the PDFs. For example, the MSTW 2008 NNLO analysis obtained $\alpha_s(M_Z^2) = 0.1171 \pm 0.0014$ from only experimental uncertainties [29], with an additional theory uncertainty ($\lesssim 0.003$), cf. the Particle Data Group world average value of $\alpha_s(M_Z^2) = 0.1176 \pm 0.0020$. The same value of $\alpha_s$ should be used in subsequent cross-section calculations. However, since the PDFs and $\alpha_s$ are correlated, the uncertainty on a hadronic cross section due to both the PDFs and $\alpha_s$ cannot simply be obtained by adding the two separate uncertainties in quadrature. A prescription has recently been developed [29] to allow consistent calculation of the combined "PDF+$\alpha_s$" uncertainty on a hadronic cross section. The additional uncertainty due to $\alpha_s$ is particularly important for processes where multiple powers of $\alpha_s$ appear at lowest order, such as Higgs production via gluon-gluon fusion or inclusive jet production, both of which enter at $\mathcal{O}(\alpha_s^2)$ at the LHC.
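
A combined uncertainty of this kind can be assembled roughly along the lines of the prescription mentioned above. This is a simplified sketch: for each of several fits at different fixed $\alpha_s(M_Z^2)$, an asymmetric Hessian PDF uncertainty is computed from the eigenvector pairs, and the combined band is the envelope of the resulting ranges, measured relative to the prediction at the best-fit $\alpha_s$. The data structure, function names and all numbers in the example are hypothetical. The actual prescription also fixes which $\alpha_s$ values and confidence levels to use, which is not reproduced here.

```python
import math


def hessian_uncertainty(central, plus, minus):
    """Asymmetric Hessian uncertainty (up, down) from eigenvector pairs.

    plus[k] and minus[k] are predictions with eigenvector k shifted up and down.
    """
    up = math.sqrt(sum(max(p - central, m - central, 0.0) ** 2 for p, m in zip(plus, minus)))
    down = math.sqrt(sum(max(central - p, central - m, 0.0) ** 2 for p, m in zip(plus, minus)))
    return up, down


def pdf_plus_alphas(fits, best_alphas):
    """Combined (up, down) uncertainty as the envelope over fits at different alpha_s.

    fits maps an alpha_s value to (central, plus_list, minus_list) for one observable.
    """
    reference = fits[best_alphas][0]
    highs, lows = [], []
    for central, plus, minus in fits.values():
        up, down = hessian_uncertainty(central, plus, minus)
        highs.append(central + up)
        lows.append(central - down)
    return max(highs) - reference, reference - min(lows)


# Hypothetical predictions (arbitrary units) with two eigenvector pairs per fit.
fits = {
    0.1157: (99.0, [100.0, 98.5], [98.0, 99.5]),
    0.1171: (100.0, [101.0, 99.5], [99.0, 100.5]),
    0.1185: (101.0, [102.0, 100.5], [100.0, 101.5]),
}
print(hessian_uncertainty(*fits[0.1171]))  # PDF-only uncertainty at the best-fit alpha_s
print(pdf_plus_alphas(fits, 0.1171))       # larger combined PDF+alpha_s uncertainty
```

Because the central prediction itself moves with $\alpha_s$, the envelope captures the correlation between the PDFs and $\alpha_s$ that a simple quadrature sum would miss.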

The W and Z total cross sections at the LHC are a potential "standard candle" for determination of the machine luminosity. The NNLO total cross sections using MSTW 2008 NNLO PDFs have a "PDF+$\alpha_s$" uncertainty of around 2-3%, while the additional uncertainty from varying the renormalisation and factorisation scales is less than 1%. Dependence on other theoretical uncertainties, such as heavy-quark masses and the specific choice of GM-VFNS used in the PDF fit, is currently under study. Most uncertainties largely cancel in the $W/Z$ and $W^+/W^-$ ratios, since they are strongly correlated between numerator and denominator.

The parton luminosity function, $\partial\mathcal{L}_{ab}/\partial M_X^2$, can be interpreted as the appropriate convolution of PDFs for production of a final state with invariant mass $M_X$ from initial-state partons $a$ and $b$. It proves very useful when studying properties of hadronic cross sections, for example, the PDF uncertainty or the dependence on different LHC beam energies [30, 31].
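
One common convention for identical beams (assumed here; normalisations differ between papers) is $\frac{\partial\mathcal{L}_{ab}}{\partial M_X^2} = \frac{1}{s}\,\frac{1}{1+\delta_{ab}}\int_\tau^1\frac{dx}{x}\left[f_a(x,M_X^2)\,f_b(\tau/x,M_X^2) + (a\leftrightarrow b)\right]$ with $\tau = M_X^2/s$. The sketch below evaluates this integral with the midpoint rule in $\ln x$. The PDFs are passed in as plain callables at a fixed scale, leaving out the scale dependence and the source of real PDFs. The toy gluon and the two beam energies compared are hypothetical inputs.

```python
import math


def parton_luminosity(f_a, f_b, m2, s, same_parton=False, n_steps=4000):
    """dL_ab/dM^2 (in 1/GeV^2) for identical beams, in the convention stated above.

    f_a, f_b: callables returning number densities f(x) at the scale M^2.
    The integral over x in [tau, 1] uses the midpoint rule in ln x.
    """
    tau = m2 / s
    if not 0.0 < tau < 1.0:
        raise ValueError("need 0 < M^2/s < 1")
    t_min = math.log(tau)
    h = -t_min / n_steps
    total = 0.0
    for k in range(n_steps):
        x = math.exp(t_min + (k + 0.5) * h)
        total += f_a(x) * f_b(tau / x) + f_b(x) * f_a(tau / x)
    symmetry = 2.0 if same_parton else 1.0  # the 1/(1 + delta_ab) factor
    return total * h / (s * symmetry)


def gluon_toy(x):
    """Hypothetical gluon density with x g(x) = 3 (1 - x)^5 (not a fitted PDF)."""
    return 3.0 * (1.0 - x) ** 5 / x


# Ratio of gg luminosities at M_X = 120 GeV for sqrt(s) = 14 TeV versus 10 TeV.
m2 = 120.0**2
ratio = parton_luminosity(gluon_toy, gluon_toy, m2, 14000.0**2, same_parton=True) / \
    parton_luminosity(gluon_toy, gluon_toy, m2, 10000.0**2, same_parton=True)
print(ratio)
```

Ratios like this one are a convenient way to see how a cross section dominated by a given partonic channel scales with beam energy, without recomputing the full hadronic cross section.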

Of course, as data accumulate at the LHC, precision measurements will provide further constraints on PDFs. In particular, measurement of low-mass Drell-Yan production at high rapidity by LHCb [32, 33] may extend the small-$x$ reach of HERA, since one of the incoming partons then carries a momentum fraction $x_2 = (M/\sqrt{s})\,e^{-y}$, which is very small for small $M$ and large $y$. However, as already noted, useful inclusion of such data in PDF fits may require small-$x$ resummation.

The importance of PDFs can only increase now that we have firmly entered the LHC era. 